better duplicate key stats during index generation #30829
Merged
Problem
See #30711
At startup, we scan storages to populate the index. During that scan we can easily identify pubkeys that appear in multiple slots (duplicates).
There are metrics on this, but they have misleading names.
More importantly, the duplicate list needs to include the first slot we encounter that contains a given duplicate pubkey so that clean will pick it up correctly.
Summary of Changes
Rename and add metrics.
When we find the first duplicate of a pubkey, also mark as a duplicate the entry that was added first, since it has now become a duplicate as well.
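The second change can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `DuplicateTracker`, its fields, and the `[u8; 32]` stand-in for a pubkey are all hypothetical names chosen for the example, and it assumes each (slot, pubkey) pair is visited once during the scan.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the real types, for illustration only.
type Slot = u64;
type Pubkey = [u8; 32];

/// Hypothetical helper: tracks which (slot, pubkey) entries are duplicates
/// while storages are scanned at startup.
#[derive(Default)]
struct DuplicateTracker {
    /// First slot in which each pubkey was encountered.
    first_seen: HashMap<Pubkey, Slot>,
    /// Every (slot, pubkey) entry whose pubkey appears in more than one slot.
    duplicates: HashSet<(Slot, Pubkey)>,
}

impl DuplicateTracker {
    fn insert(&mut self, slot: Slot, pubkey: Pubkey) {
        // `copied()` ends the borrow of `first_seen` before we mutate it.
        match self.first_seen.get(&pubkey).copied() {
            // First time we see this pubkey: remember its slot, not a duplicate yet.
            None => {
                self.first_seen.insert(pubkey, slot);
            }
            // Seen before in a different slot: the new entry is a duplicate,
            // and so is the entry that was added first.
            Some(first_slot) if first_slot != slot => {
                self.duplicates.insert((first_slot, pubkey));
                self.duplicates.insert((slot, pubkey));
            }
            // Same slot again: nothing to record in this sketch.
            Some(_) => {}
        }
    }
}
```

With this shape, inserting a pubkey at slot 10 and then again at slot 20 leaves both `(10, pubkey)` and `(20, pubkey)` in `duplicates`, so a later pass over the duplicate list (such as clean) also sees the first slot.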
Fixes #